Abstract

A new identification criterion, motivated by notions of successively improving
approximations in the philosophy of science, is defined. It is shown that the class of
recursive functions is identifiable under this criterion. This result is extended to
permit somewhat more realistic types of data than usual. This criterion is then modified
to consider restrictions on the quality of approximations.



STATEMENT  "A"  per  D. Hughes 
ONR/Code  11SP 

TELECON  5/18/90 



1  Introduction 


Research in inductive inference has historically been motivated by considerations of the
philosophy of science, e.g. [CS83]. However, the criteria of success so far proposed seem
unrealistic for science.

In  the  sequel  we  assume  that  scientific  experiments  and  observations  are  encoded  as 
natural  numbers  and  that  the  process  of  scientific  theory  formation  can  be  modeled  by  an 
algorithmic  device  which  operates  on  the  encoded  experiments  and  observations. 

Gold’s  [Gol67]  criterion  demands  that  an  inductive  inference  machine  produce  a  final 
correct  program  (in  the  sense  that  it  correctly  computes  the  input  function  or  set);  others 
[CS83]  have  liberalized  that  criterion  to  allow  final  programs  that  are  correct  except  on 
finitely  many  inputs.  [Bar74]  and  [CS83]  also  give  criteria  that  permit  infinite  sequences  of 
programs,  nearly  all  of  which  are  (perhaps  only  nearly)  correct. 

We  hold  with  Peirce  [Pei58]  that  science  cannot  be  expected  to  produce  a  final  theory 
of  anything,  nor  even  a  cofinal  sequence  of  nearly  correct  theories.  Instead,  the  best  we 
can  hope  for  is  that  science  produces  an  infinite  sequence  of  improving  approximations  to 
reality.  It  is  the  point  of  this  paper  to  give  precision  to  this  notion. 

2  Notations 

$N$ is the set of natural numbers. $i, j, l, m, n, x, y, z$ and variously decorated versions
thereof range over natural numbers unless otherwise specified. $d$ ranges over the real
interval $[0, 1]$. $f, g$ range over functions from $N$ to $N$. $\emptyset$ denotes the null
set. card($S$) denotes the cardinality of the set $S$. max, min denote the maximum and
minimum of a set respectively. $\mu x[Q(x)]$ is the least natural number $x$ such that
$Q(x)$ is true (if such exists).

$\mathcal{R}$ denotes the class of recursive functions. $\varphi$ denotes an acceptable
numbering [Rog58], [Rog67]. $\Phi$ denotes an arbitrary Blum complexity measure [Blu67] for
$\varphi$. $\langle \cdot, \cdot \rangle$ stands for an arbitrary computable one-to-one
encoding of all pairs of natural numbers onto $N$ [Rog67]. $\pi_1$ and $\pi_2$ are the
corresponding projection functions. $\langle \cdot, \cdot \rangle$ is extended to $n$-tuples
in the usual way. For any two partial functions $\eta_1$ and $\eta_2$, $\eta_1 =^n \eta_2$
means that card($\{x \mid \eta_1(x) \ne \eta_2(x)\}$) $\le n$. $\eta_1 =^* \eta_2$ means that
card($\{x \mid \eta_1(x) \ne \eta_2(x)\}$) is finite. For any two sets $S_1$ and $S_2$,
$S_1 =^n S_2$ denotes card($(S_1 - S_2) \cup (S_2 - S_1)$) $\le n$. $S_1 =^* S_2$ denotes
that card($(S_1 - S_2) \cup (S_2 - S_1)$) is finite. $S_1 \oplus S_2$ denotes the symmetric
difference of the sets $S_1$ and $S_2$.
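The text only requires some computable bijection between pairs and natural numbers. As a concrete illustration, the sketch below assumes the Cantor pairing function; the names `pair` and `unpair` are hypothetical and are not taken from the source.

```python
from math import isqrt
from typing import Tuple


def pair(x: int, y: int) -> int:
    """Cantor pairing (an assumed choice): maps N x N one-to-one onto N."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z: int) -> Tuple[int, int]:
    """Inverse of pair; the two components play the roles of pi_1 and pi_2."""
    w = (isqrt(8 * z + 1) - 1) // 2  # index of the anti-diagonal containing z
    y = z - w * (w + 1) // 2         # position of z on that anti-diagonal
    return w - y, y
```

Any other computable bijection would serve equally well; the Cantor pairing is used here only because both directions are easy to compute.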

Sequences are functions with domain an initial segment of $N$. An information sequence is
an infinite sequence; a segment is a finite sequence. $t$ ranges over information
sequences. $\sigma, \sigma_0, \ldots$ range over segments. $t_n$ denotes the initial segment
of $t$ with length $n$. content($t$) = range($t$) $- \{*\}$; intuitively it is the set of
meaningful things presented in $t$. Similarly, content($\sigma$) = range($\sigma$) $- \{*\}$.
length($\sigma$) denotes the length of $\sigma$. $\sigma_1 \circ \sigma_2$ denotes the
concatenation of $\sigma_1$ and $\sigma_2$. For all recursive functions $f$, $f|^n$ denotes
the finite segment
$(\langle 0, f(0) \rangle, \langle 1, f(1) \rangle, \langle 2, f(2) \rangle, \ldots, \langle n, f(n) \rangle)$.

$M, M_0, \ldots$ denote IIMs. $M(\sigma)$ is the last output of $M$ after receiving input
$\sigma$ (note that $\sigma$ can be encoded as a natural number). $M(t)\downarrow = i$ iff
$(\forall^\infty n)[M(t_n) = i]$. We write $M(t)\downarrow$ iff
$(\exists i)[M(t)\downarrow = i]$. Any unexplained notation is from [Rog67].

3  Preliminaries 

In  this  section  we  briefly  discuss  notions  from  recursion  theoretic  machine  learning  literature. 
For  detailed  discussion  see  [OSW86;  CS83;  Gol67;  AS83;  KW80;  BB75]. 

Inductive inference machines (IIMs) have been used in the study of identification of
recursive functions as well as recursively enumerable languages [Gol67; CS83; CL82; AS83;
KW80; Ful85; BB75]. For function learning the input sequence given to the IIM is
$(0, f(0)), (1, f(1)), \ldots$, where $f$ is the function being learned. A criterion of
success (called Ex-identification) is for the machine to eventually output a last program,
which computes (nearly computes) $f$. Formally:

Definition 1 [Gol67; BB75; CS83] $M$ $Ex^a$-identifies $f$ (written: $f \in Ex^a(M)$) iff both
$M(f)\downarrow$ and $\varphi_{M(f)} =^a f$.

Definition 2 [Gol67; BB75; CS83] $Ex^a = \{S \subseteq \mathcal{R} \mid (\exists M)[S \subseteq Ex^a(M)]\}$.

In the above definitions $a$ stands for the number of anomalies allowed in the final
program. $a = *$ means that an unbounded but finite number of anomalies is allowed in the
final program. Case and Smith [CS83] introduced another infinite hierarchy of
identification criteria, which we describe below. "BC" stands for behaviorally correct.
Barzdin [Bar74] independently introduced a similar notion.

Definition 3 [CS83] $M$ $BC^a$-identifies $f$ (written: $f \in BC^a(M)$) iff $M$, fed $f$,
outputs over time an infinite sequence of programs $p_0, p_1, p_2, \ldots$ such that
$(\forall^\infty n)[\varphi_{p_n} =^a f]$.

Definition 4 [CS83] $BC^a = \{S \subseteq \mathcal{R} \mid (\exists M)[S \subseteq BC^a(M)]\}$.

We usually write Ex for $Ex^0$, TxtEx for $TxtEx^0$, BC for $BC^0$, and TxtBC for
$TxtBC^0$.

4  Approximation  of  recursive  functions 

Definition 5 $M$ Ap-identifies $f$ (written: $f \in Ap(M)$) iff there is a sequence of sets
$S_n \subseteq N$ such that:

i) for all $n$, $M(f|^n)$ correctly computes $f$ on all $x \in S_n$;

ii) for all $n$, $S_n \subseteq S_{n+1}$;

iii) for all $x$ there is $n$ with $x \in S_n$;

iv) there are infinitely many $n$ with $S_{n+1} - S_n$ infinite.

If $M$ Ap-identifies $f$, then we also say that $M$ approximates $f$.

Theorem  1  There  is  an  inductive  inference  machine  M  that  approximates  every  recursive 
function. 

Proof: The idea is that $M$ partitions $N$ into infinitely many infinite subsets; $M$ then
carries out a separate induction process for each subset. In the inference process for one
subset, $M$ uses the number of the subset as a bound on the Gödel number of programs to
be considered.

When $M(f|^n)$ is run on an input $x \ge n$ from the $i$-th element of the partition, it
uses the program less than $i$ that best fits $f|^n$, where $n$ is used to bound the
computation of the degree of fit. For inputs $x < n$, $M(f|^n)$ outputs $f(x)$, which is in
the input.

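Let patch, select, err, and best be recursive functions such that:

$$\varphi_{\mathrm{patch}(i,\sigma)}(x) = \begin{cases} y, & \text{if } (\exists y)[\langle x, y \rangle \in \mathrm{content}(\sigma)]; \\ \varphi_i(x), & \text{otherwise;} \end{cases}$$

$$\varphi_{\mathrm{select}(n, \langle a_0, \ldots, a_n \rangle)}(\langle i, x \rangle) = \begin{cases} \varphi_{a_i}(\langle i, x \rangle), & \text{if } i \le n; \\ \text{[illegible in source]}, & \text{otherwise;} \end{cases}$$

$$\mathrm{err}(j, \sigma) = \mu x[x = |\sigma| \text{ or } \Phi_j(x) > |\sigma| \text{ or } \varphi_j(x) \ne \sigma(x)];$$

$$\mathrm{best}(b, \sigma) = \mu i[i \le b \wedge (\forall j \le b)[\mathrm{err}(i, \sigma) \ge \mathrm{err}(j, \sigma)]].$$

patch patches $\sigma$ into program $i$. select chooses one of $a_0, \ldots, a_n$ and runs it
according to the left projection of the input. err finds the first apparent error
committed by $j$ relative to $\sigma$. best finds the best program (as measured by err) for
$\sigma$ and less than $b$.

Let $M(\sigma)$ be
$\mathrm{patch}(\mathrm{select}(|\sigma|, \langle \mathrm{best}(0, \sigma), \ldots, \mathrm{best}(|\sigma|, \sigma) \rangle), \sigma)$.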

Note that for any sufficiently large $i$ and $n$, $\mathrm{best}(i, f|^n)$ will be a program
for $f$. We may take
$S_n = \{0, \ldots, n-1\} \cup \bigcup_{\{k \mid k < n \wedge (\forall m \ge n)[\varphi_{\mathrm{best}(k, f|^m)} = f]\}} \{\langle k, x \rangle \mid x \in N\}$.
Verification of the properties (i-iv) in Definition 5 is immediate. □
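The construction in the proof can be imitated on a toy scale. The following is only a sketch under loudly stated assumptions: the numbering `phi` (program j computes x -> j * x), the step count `cost`, and the Cantor-style `unpair` are hypothetical stand-ins for the acceptable numbering, the Blum measure and the pairing function; the segment is represented as the list of values f(0), ..., f(n-1); and partitions numbered above n fall back to bound n, which is an assumption because that case is not legible in the source.

```python
from math import isqrt
from typing import Callable, List, Tuple


def unpair(z: int) -> Tuple[int, int]:
    """Inverse Cantor pairing (assumed choice of pairing function)."""
    w = (isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y


def phi(j: int, x: int) -> int:
    """Toy numbering (hypothetical): program j computes x -> j * x."""
    return j * x


def cost(j: int, x: int) -> int:
    """Toy stand-in for the complexity measure: program j takes j + 1 steps."""
    return j + 1


def err(j: int, sigma: List[int]) -> int:
    """Least x where the data ends, the step bound is exceeded, or j disagrees."""
    n = len(sigma)
    x = 0
    while x != n and cost(j, x) <= n and phi(j, x) == sigma[x]:
        x += 1
    return x


def best(b: int, sigma: List[int]) -> int:
    """Least i <= b whose first apparent error comes as late as possible."""
    return max(range(b + 1), key=lambda i: err(i, sigma))  # max keeps the first maximum


def M(sigma: List[int]) -> Callable[[int], int]:
    """Hypothesis after seeing sigma: patched data plus one guess per partition."""
    n = len(sigma)

    def hypothesis(z: int) -> int:
        if z < n:                    # patch: values already seen are copied
            return sigma[z]
        i, _ = unpair(z)             # z belongs to partition i
        return phi(best(min(i, n), sigma), z)  # fallback min(i, n) is assumed

    return hypothesis
```

With f(x) = 3x and six data points, every partition numbered 3 or higher is already predicted correctly, while partitions 0, 1 and 2 can only use programs with smaller indices and remain wrong beyond the data. This matches the remark in the final section that the lower numbered partitions are never completely predicted.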


5  Density  Restrictions  on  Ap-criterion 

The algorithm given in the previous section Ap-identifies all the recursive functions. It
is easy to choose a pairing function such that for every recursive function $f$ the
limiting density of the sets $S_n$ is positive. In this section we study the effects of
requiring the limiting density of the sets $S_0, S_1, \ldots$ so formed to be above a
certain prespecified value. First we formally define what we mean by the density of a set.

Definition 6 [Roy86] The density of a set $A \subseteq N$ in a finite and nonempty set $B$
(denoted: $d(A; B)$) is card($A \cap B$)/card($B$).

Intuitively,  d (A;B)  can  be  thought  of  as  the  probability  of  selecting  an  element  of  A 
when  choosing  an  arbitrary  element  from  B. 
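For finite $B$ this quantity is directly computable. A minimal sketch, assuming the set $A$ is given by a membership predicate (the name `density_in` is hypothetical):

```python
from fractions import Fraction
from typing import Callable, Iterable


def density_in(in_A: Callable[[int], bool], B: Iterable[int]) -> Fraction:
    """d(A; B) = card(A intersect B) / card(B) for a finite, nonempty B."""
    elements = set(B)
    if not elements:
        raise ValueError("B must be nonempty")
    return Fraction(sum(1 for z in elements if in_A(z)), len(elements))
```

For instance, `density_in(lambda z: z % 2 == 0, range(10))` is the density of the even numbers in {0, ..., 9}. Exact fractions are used so that densities can be compared without rounding.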

Definition 7 [Roy86] The density of a set $A \subseteq N$ (denoted: $d(A)$) is
$\lim_{n \to \infty} \inf\{d(A; \{z \mid z \le x\}) \mid x \ge n\}$.

Definition 8 Let $d \in [0, 1]$. An IIM $M$ $DAp^d$-identifies a function $f$ (written:
$f \in DAp^d(M)$) iff there exists a sequence of sets $S_n \subseteq N$ such that:

(i) for all $n$, $M(f|^n)$ correctly computes $f$ on all $x \in S_n$;

(ii) for all $n$, $S_n \subseteq S_{n+1}$;

(iii) for all $x$ there is $n$ with $x \in S_n$;

(iv) there are infinitely many $n$ with $S_{n+1} - S_n$ infinite;

(v) $\lim_{n \to \infty} d(S_n) \ge d$.

Definition 9 Let $d \in [0, 1]$. $DAp^d = \{C \subseteq \mathcal{R} \mid (\exists M)[C \subseteq DAp^d(M)]\}$.

Even though the limiting density of a set may be 1, there may be arbitrarily large gaps.
We thus introduce another form of identification which prohibits such large gaps.

Definition 10 [Roy86] The uniform density of a set $A$ in intervals of length $\ge n$
(denoted: $ud_n(A)$) is
$\inf(\{d(A; \{z \mid x \le z \le y\}) \mid x, y \in N \text{ and } y - x \ge n\})$. The
uniform density of $A$ (denoted: $ud(A)$) is $\lim_{n \to \infty} ud_n(A)$.
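The infimum in the definition of uniform density ranges over all intervals, so it cannot be computed exactly in general. The sketch below restricts attention to intervals inside a finite window [0, N]; the result is therefore only an upper bound on the true value. The function name `ud_window` and the window parameter `N` are hypothetical, and the set is assumed to be given by a membership predicate.

```python
from fractions import Fraction
from itertools import accumulate
from typing import Callable


def ud_window(in_A: Callable[[int], bool], n: int, N: int) -> Fraction:
    """Minimum of d(A; {x..y}) over 0 <= x <= y <= N with y - x >= n."""
    if N < n:
        raise ValueError("the window must contain an interval with y - x >= n")
    # prefix[k] = number of members of A among 0, ..., k - 1
    prefix = [0] + list(accumulate(1 if in_A(z) else 0 for z in range(N + 1)))
    return min(
        Fraction(prefix[y + 1] - prefix[x], y - x + 1)
        for x in range(N + 1)
        for y in range(x + n, N + 1)
    )
```

For the even numbers the windowed value moves toward 1/2 as n grows, in line with a uniform density of 1/2; a set with ever longer gaps would instead keep the windowed value at 0 once the window is large enough to contain a gap with at least n + 1 elements.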

Definition 11 Let $d \in [0, 1]$. An IIM $M$ $UDAp^d$-identifies a function $f$ (written:
$f \in UDAp^d(M)$) iff there exists a sequence of sets $S_n \subseteq N$ such that:

(i) for all $n$, $M(f|^n)$ correctly computes $f$ on all $x \in S_n$;

(ii) for all $n$, $S_n \subseteq S_{n+1}$;

(iii) for all $x$ there is $n$ with $x \in S_n$;

(iv) there are infinitely many $n$ with $S_{n+1} - S_n$ infinite;

(v) $\lim_{n \to \infty} ud(S_n) \ge d$.

Definition 12 Let $d \in [0, 1]$. $UDAp^d = \{C \subseteq \mathcal{R} \mid (\exists M)[C \subseteq UDAp^d(M)]\}$.

The following theorem shows that no machine can identify all of the recursive functions
if the sequence of sets $S_n$ is required to have limiting density $\ge \epsilon > 0$.

Theorem 2 $(\forall d,\ 0 < d \le 1)[\mathcal{R} \notin DAp^d]$




Proof: We prove the theorem for $d > 1/3$. The proof can be generalized to the case when
$d > 1/n$, $n \in N$. Suppose by way of contradiction that IIM $M$
$DAp^{1/3+\epsilon}$-identifies $\mathcal{R}$, $\epsilon > 0$. Then by ORT [Cas74] there
exists a recursive 1-1 $p$ such that the following holds. Let $\varphi^s_e$ denote
$\varphi_e$ defined before stage $s$. Let $\varphi_{p(0)}(0) = 0$. Let $x_s$ denote the
maximum $x$ such that $\varphi_{p(0)}(x)$ is defined before stage $s$. Go to stage 1.


Stage $s$

Dovetail steps 1 and 2.
If 1 succeeds go to step 3.

1. Search for
   1a. $l, m$ with $l > x_s$, $m > l/(\epsilon/100)$;
   1b. $y_{x_s+1}, y_{x_s+2}, \ldots, y_l$; and
   1c. $w_1, w_2, \ldots, w_k$ with $l < w_i \le m$, $k \ge m \cdot (2/3 + \epsilon/2)$
   such that
   $(\forall i,\ 1 \le i \le k)[\varphi_{M(\varphi_{p(0)}|^{x_s} \circ (x_s+1, y_{x_s+1}) \circ \cdots \circ (l, y_l))}(w_i)\downarrow]$

2. Let $\varphi_{p(s)}(x) = \varphi_{p(0)}(x)$ for $x \le x_s$.
   Let $x_{s,s'}$ denote the maximum $x$ such that $\varphi_{p(s)}(x)$ is defined before
   substage $s'$.
   Go to substage 0.

   Substage $s'$

   2.1. Search for
      2.1a. $l, m$ with $l > x_{s,s'}$, $m > l/(\epsilon/100)$;
      2.1b. $y_{x_{s,s'}+1}, y_{x_{s,s'}+2}, \ldots, y_l$; and
      2.1c. $w_1, w_2, \ldots, w_k$ with $l < w_i \le m$, $k \ge m \cdot (1/3 + \epsilon/2)$
      such that
      $(\forall i,\ 1 \le i \le k)[\varphi_{M(\varphi_{p(s)}|^{x_{s,s'}} \circ (x_{s,s'}+1, y_{x_{s,s'}+1}) \circ \cdots \circ (l, y_l))}(w_i)\downarrow]$

   2.2. If and when such $l, m, \ldots$ are found, let
      2.2.a. $\varphi_{p(s)}(x) = y_x$ for $x_{s,s'} < x \le l$.
      2.2.b. Let $e = M(\varphi_{p(s)}|^{x_{s,s'}} \circ (x_{s,s'}+1, y_{x_{s,s'}+1}) \circ \cdots \circ (l, y_l))$.
             Let $\varphi_{p(s)}(x) = \varphi_e(x) + 1$ for $x \in \{w_i \mid 0 < i \le k\}$.
      2.2.c. Let $\varphi_{p(s)}(x) = 0$ if $x \le m$ and $\varphi_{p(s)}(x)$ has not been
             defined till now.
      Go to substage $s' + 1$.

   End substage $s'$.

3. If and when 1 succeeds let $l, m, \ldots$ be as found in 1. Let
   3.1.a. $\varphi_{p(0)}(x) = y_x$ for $x_s < x \le l$.
   3.1.b. Let $e = M(\varphi_{p(0)}|^{x_s} \circ (x_s+1, y_{x_s+1}) \circ \cdots \circ (l, y_l))$.
          Let $\varphi_{p(0)}(x) = \varphi_e(x) + 1$ for $x \in \{w_i \mid 0 < i \le k\}$.
   3.1.c. Let $\varphi_{p(0)}(x) = 0$ if $x \le m$ and $\varphi_{p(0)}(x)$ has not been
          defined till now.
   Go to stage $s + 1$.

End stage $s$.


Now consider the following cases:

Case 1: There are infinitely many stages.

In this case let $f = \varphi_{p(0)}$. Clearly $f$ is recursive. We claim that no
$S_0, S_1, \ldots$ can exist satisfying (i-v) in Definition 8 for $f$ and
$d = 1/3 + \epsilon$. Suppose otherwise. Then there exist $n_1, n_2$ such that
$d(S_{n_1}) > 1/3 + (60/100)\epsilon$. Also for all $x > n_2$,
$d(S_{n_1}; [0..x]) > 1/3 + (55/100)\epsilon$. But then in all the stages greater than
$\max(n_1, n_2)$, due to step 3 and the way $l$ and $m$ were chosen, we have that there
exists an error point for $M(f|^l)$ in $S_l$ (since $l > \max(n_1, n_2)$ and the fraction of
points up to $m$ on which $M(f|^l)$ commits an error is at least $2/3 + \epsilon/2$). This
contradicts (i) in Definition 8. Thus $M$ does not $DAp^{1/3+\epsilon}$-identify $f$.

Case 2: Stage $s$ never halts. In this case, $M$ does not output a program which has a
domain of limiting density more than $2/3 + (51/100)\epsilon$ on any extension of
$\varphi^s_{p(0)}$.

Case 2.1: In stage $s$ there are infinitely many substages.

In this case let $f = \varphi_{p(s)}$. Since on $f$ $M$ never outputs a program which has a
domain of limiting density more than $2/3 + (51/100)\epsilon$, arguing as in case 1 we have
that $M$ does not $DAp^{1/3+\epsilon}$-identify $f$.

Case 2.2: In stage $s$, substage $s'$ never halts. In this case on no extension of
$\varphi_{p(s)}|^{x_{s,s'}}$ does $M$ output a program which has domain of limiting density
$1/3 + (51/100)\epsilon$. Let $f$ be any recursive function which is an extension of
$\varphi_{p(s)}|^{x_{s,s'}}$. Then $M$ does not $DAp^{1/3+\epsilon}$-identify $f$.

From the above cases we have that $M$ does not $DAp^{1/3+\epsilon}$-identify $\mathcal{R}$.
When $d > 1/n$ the above proof can be generalized by taking $n - 1$ levels of iteration
instead of 2 as done in the above procedure. We leave the details to the reader. □

Theorem 3 $(\forall d_1, d_2 \mid 0 \le d_1 < d_2 \le 1)[UDAp^{d_1} - DAp^{d_2} \ne \emptyset]$

Proof: Without loss of generality let $d_1 = m/n$ and $d_2 = (m+1)/n$. Let
$C = \{f \in \mathcal{R} \mid x \bmod n < m \Rightarrow f(x) = 0\}$. An easy modification of
the procedure to Ap-identify all the recursive functions given in the previous section
gives us a procedure to Ap-identify $C$ with $S_0 = \{x \mid x \bmod n < m\}$. Thus
$C \in UDAp^{d_1}$. Suppose by way of contradiction that IIM $M$ $DAp^{d_2}$-identifies $C$.
Then $M$ can be easily modified to $DAp^{1/(n-m)}$-identify all the recursive functions,
contradicting Theorem 2. □
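The source does not spell out where the density in the last step comes from. One way to read it, offered only as a sketch: write $F = \{x \mid x \bmod n < m\}$ for the positions on which every function in $C$ is 0, let $S \supseteq F$ be a set whose density in the prefix $[0..x]$ is at least $(m+1)/n$, and assume for simplicity that $x + 1$ is a multiple of $n$. The symbols $F$ and $S$ are introduced here for illustration. Then the share of the remaining, unconstrained positions that $S$ must cover is

```latex
\[
\frac{\operatorname{card}\bigl((S - F) \cap [0..x]\bigr)}
     {\operatorname{card}\bigl((N - F) \cap [0..x]\bigr)}
\;\ge\;
\frac{\frac{m+1}{n}(x+1) - \frac{m}{n}(x+1)}{\frac{n-m}{n}(x+1)}
\;=\;
\frac{1}{n-m}.
\]
```

Since an arbitrary recursive function can be laid out along the $n - m$ unconstrained residues, a machine achieving density $(m+1)/n$ on $C$ would achieve density about $1/(n-m)$ on all recursive functions.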

Theorem 4 $(\forall d,\ 0 < d \le 1)[DAp^1 - UDAp^d \ne \emptyset]$

Proof: Let
$C = \{f \in \mathcal{R} \mid (\forall x)[(\exists n)[2^n \le x \le 2^{n+1} - n] \Rightarrow f(x) = 0]\}$.
Again an easy modification of the procedure to Ap-identify all the recursive functions
given in the previous section gives us a procedure to Ap-identify $C$ with
$S_0 = \{x \mid (\exists n)[2^n \le x \le 2^{n+1} - n]\}$.

Thus $C \in DAp^1$. Suppose by way of contradiction that IIM $M$ $UDAp^d$-identifies $C$.
Then $M$ can be easily modified to $UDAp^d$-identify all the recursive functions,
contradicting Theorem 2. □
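The set $S_0$ in this proof illustrates the difference between the two notions of density: its prefix density approaches 1 while its gaps grow without bound. The sketch below checks this numerically on a finite range. It assumes the reading $2^n \le x \le 2^{n+1} - n$ with non-strict inequalities, and the helper names are hypothetical.

```python
from fractions import Fraction


def in_S0(x: int) -> bool:
    """Membership in {x | 2**n <= x <= 2**(n + 1) - n for some n}."""
    if x < 1:
        return False
    n = x.bit_length() - 1  # floor(log2 x); no other n can witness membership
    return x <= 2 ** (n + 1) - n


def longest_gap(limit: int) -> int:
    """Length of the longest run of non-members among 0, ..., limit - 1."""
    longest = run = 0
    for x in range(limit):
        run = 0 if in_S0(x) else run + 1
        longest = max(longest, run)
    return longest


def prefix_density(limit: int) -> Fraction:
    """Density of S0 in {0, ..., limit - 1}."""
    return Fraction(sum(1 for x in range(limit) if in_S0(x)), limit)
```

Below $2^{12}$ the longest gap already has 10 elements and keeps growing with the range, whereas the prefix density is above 0.98. Any interval that sits inside a gap has density 0, which is why no positive uniform density is available from this $S_0$ alone.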

Now we show that even though all the recursive functions cannot be $DAp^d$-identified,
there are large classes of functions which can be $DAp^1$-identified.

Theorem 5 $(\forall i \in N)[BC^i \subseteq DAp^1]$

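Proof: Given IIM $M$ we construct another machine $M'$ which $DAp^1$-identifies all
functions $BC^i$-identified by $M$. Let $h(n) = \max(\{2^k : 2^k \le n\})$. Let
$M'(f|^n) = \mathrm{patch}(M(f|^{h(n)}), f|^n)$, where

$$\varphi_{\mathrm{patch}(i,\sigma)}(x) = \begin{cases} y, & \text{if } (\exists y)[\langle x, y \rangle \in \mathrm{content}(\sigma)]; \\ \varphi_i(x), & \text{otherwise.} \end{cases}$$

Let $n_0$ be so large that for all $n \ge 2^{n_0}$, $\varphi_{M(f|^n)} =^i f$. Let
errset $= \{x \mid (\exists n \mid n_0 \le n \le \lfloor \log x \rfloor)[\varphi_{M(f|^{2^n})}(x) \ne f(x)]\}$.
Clearly, if $M'(f|^n)$, $n \ge 2^{n_0}$, commits an error on $x$ then $x \in$ errset. Also the
limiting density of errset is 0. Theorem follows. □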
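The reason the error set is sparse can be seen from how rarely the underlying hypothesis changes. The sketch below assumes that $h(n)$ is the largest power of two not exceeding $n$, which is a reading of a damaged formula in the source, and that each underlying hypothesis has at most $i$ anomalies; the function names are hypothetical.

```python
from fractions import Fraction
from typing import List


def h(n: int) -> int:
    """Largest power of two not exceeding n (assumed reading of the source)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n.bit_length() - 1)


def hypothesis_lengths(x: int) -> List[int]:
    """Distinct data lengths h(n) consulted for 1 <= n <= x."""
    return sorted({h(n) for n in range(1, x + 1)})


def error_fraction_bound(i: int, x: int) -> Fraction:
    """Upper bound on the share of points up to x lying in the error set."""
    return Fraction(i * len(hypothesis_lengths(x)), x)
```

Only about $\log x$ distinct hypotheses are consulted up to $x$, each wrong on at most $i$ points, so the bound behaves like $i \log x / x$ and tends to 0.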

6  Further  considerations  and  open  problems 

Note that $M$ in the proof of Theorem 1 is clearly immune to corrected noise: as long as
$M$ is eventually apprised of the correct value for any given data point, it will still
succeed in approximating the input function.

Scientific experiments are not always deterministic; even if quantum mechanical
indeterminacy is not important, one can never be certain that all of the significant
variables have been controlled. On the other hand it is arguable that the set of possible
outcomes of an experiment is always finite.

These  considerations  lead  to  a  formulation  of  inductive  inference  in  which  the  function 
to  be  learned  carries  experimental  descriptions  to  finite  sets  of  outcomes,  and  the  data  to 
the  inductive  inference  machine  consists  of  experiments  paired  with  one  outcome  at  a  time. 
It  should  be  clear  that,  even  under  these  circumstances,  the  result  of  Theorem  (1)  holds. 

In  some  respects,  these  results  are  not  very  satisfactory.  One  would  like  to  be  able  to 
give  some  account  of  the  variations  in  the  degree  of  confidence  we  have  in  the  outcomes  of 
various  experiments.  Also,  the  lower  numbered  partitions  of  the  data  are  never  completely 
predicted.  Science  does  partition  experiments  into  classes,  and  treat  each  class  to  some 
extent  separately;  but  the  classes  are  not  simply  the  arbitrary  choices  made  by  a  pairing 
function  but  reflect,  to  some  extent,  the  results  of  the  experiments. 

It remains open whether or not one can approximate a set from positive data only; we
conjecture not.